4 research outputs found
Playing for Data: Ground Truth from Computer Games
Recent progress in computer vision has been driven by high-capacity models
trained on large datasets. Unfortunately, creating large datasets with
pixel-level labels has been extremely costly due to the amount of human effort
required. In this paper, we present an approach to rapidly creating
pixel-accurate semantic label maps for images extracted from modern computer
games. Although the source code and the internal operation of commercial games
are inaccessible, we show that associations between image patches can be
reconstructed from the communication between the game and the graphics
hardware. This enables rapid propagation of semantic labels within and across
images synthesized by the game, with no access to the source code or the
content. We validate the presented approach by producing dense pixel-level
semantic annotations for 25 thousand images synthesized by a photorealistic
open-world computer game. Experiments on semantic segmentation datasets show
that using the acquired data to supplement real-world images significantly
increases accuracy and that the acquired data enables reducing the amount of
hand-labeled real-world data: models trained with game data and just 1/3 of the
CamVid training set outperform models trained on the complete CamVid training
set.Comment: Accepted to the 14th European Conference on Computer Vision (ECCV
2016
NOVA: rendering virtual worlds with humans for computer vision tasks
Today, the cutting edge of computer vision research greatly depends on the availability of large datasets, which are critical for effectively training and testing new methods. Manually annotating visual data, however, is not only a labor-intensive process but also prone to errors. In this study, we present NOVA, a versatile framework to create realistic-looking 3D rendered worlds containing procedurally generated humans with rich pixel-level ground truth annotations. NOVA can simulate various environmental factors such as weather conditions or different times of day, and bring an exceptionally diverse set of humans to life, each having a distinct body shape, gender and age. To demonstrate NOVA's capabilities, we generate two synthetic datasets for person tracking. The first one includes 108 sequences, each with different levels of difficulty like tracking in crowded scenes or at nighttime and aims for testing the limits of current state-of-the-art trackers. A second dataset of 97 sequences with normal weather conditions is used to show how our synthetic sequences can be utilized to train and boost the performance of deep-learning based trackers. Our results indicate that the synthetic data generated by NOVA represents a good proxy of the real-world and can be exploited for computer vision tasks